In this paper, the asymptotic distributions of estimators for the regularized functional canonical
correlation and variates of the population are derived. The method is based on the possibility 
of expressing these regularized quantities as the maximum eigenvalue and the corresponding 
eigenfunctions of an associated pair of regularized operators, similar to the Euclidean case. The 
known weak convergence of the sample covariance operator, coupled with a delta-method for 
analytic functions of covariance operators, yields the weak convergence of the pair of associ- 
ated operators. From the latter weak convergence, the limiting distributions of the canonical 
quantities of interest can be derived with the help of some further perturbation theory. 
1. Introduction

This paper deals with the asymptotic distribution theory of functional canonical corre- 
lations and their variates. Although tailored to these particular problems, the methodol- 
ogy is of a generic character and may also apply to questions regarding the asymptotic 
distribution of other statistics used in functional data analysis. The problem will be for- 
mulated in a general Hilbert space setting where the Hilbert space is tacitly assumed to 
be infinite-dimensional and separable. 

In this infinite-dimensional case, some difficulties regarding the definition of the sample 
canonical correlation have already been observed in Leurgans et al. (1993). The authors 
of that paper argue that some kind of smoothing or regularization is indispensable when 
dealing with the sample canonical correlation. These difficulties are essentially due to 
the fact that the sample covariance operator has a so-called finite-dimensional kernel 
(Riesz and Sz.-Nagy (1990)), while acting on an infinite-dimensional space. Leurgans et
al. (1993) realize smoothing by introducing a roughness penalty term. Although there 
is a connection between Tikhonov regularization of inverse operators (employed in this 
paper) and the use of penalty terms, the relation with the roughness penalty cannot be 
established within the present context of our paper. He et al. (2004) apply dimension 
reduction/augmentation at the level of the actual data and base the empirical canonical 
correlation on these modified data. This approach differs considerably from ours, which is 
based on regularization of the canonical correlation itself. The results in He et al. (2004) 
are for fixed sample size and the asymptotics in Leurgans et al. (1993) remain restricted 
to consistency. 

In 2006 it has been observed that the population canonical correlation, although well 
defined in principle, is, in general, a supremum of a certain functional, rather than a 
maximum, so that a maximizer (i.e., a pair of canonical variates) may not always exist 
in the ambient Hilbert space. Another deficiency is that, even if the canonical correlation
corresponds to a maximum and canonical variates do exist, these quantities cannot be 
interpreted as the maximum eigenvalue and corresponding eigenvector of a pair of associ- 
ated operators, as is true in the Euclidean setting. The development in 2006 shows that all 
of these deficiencies of the population canonical correlation can be remedied if a modifica- 
tion is employed, based on regularization of the inverses of the operators involved. Also, 
some relations between the actual population quantities and their regularized versions 
are established in that paper. 

The present approach to finding the asymptotic distribution of the regularized sample 
canonical correlation and its variates hinges to a great extent on the interpretation of 
both the regularized sample and the regularized population quantities as spectral char- 
acteristics of associated pairs of operators. In Section 4 of this paper, the asymptotic 
distribution of a regularized version of the sample canonical correlation and its variates 
will be derived. In the Euclidean case, where regularization is not needed, this approach 
has been pursued in Ruymgaart and Yang (1997), exploiting certain results in Watson 
(1983). 

One of the main tools needed to derive the desired asymptotics is a delta-method 
for analytic functions of certain random operators (more specifically, sample covariance
operators). This delta-method might be of independent interest and is considered in Sec- 
tion 3. It is based on the existence of a Frechet derivative of an analytic function of a 
compact, strictly positive Hermitian operator, tangentially to the space of all compact 
Hermitian operators. Because we cannot make the simplifying assumption that the in- 
crements commute with the operator at which the function is evaluated, the expression 
for the Frechet derivative requires an extra correction term. The delta-method yields 
the asymptotic distribution of the associated operators, from which the asymptotics of 
their eigenvalues and eigenvectors can be derived in a similar manner as in Dauxois et 
al. (1982). 

As has been observed above, without regularization, the population canonical variates 
do not, in general, exist and, consequently, it seems appropriate to maintain a fixed level 
of regularization for suitable asymptotics. Mathematically, a fixed level of regularization 
leads to root-sample-size asymptotics. When the regularization parameter tends to zero, 
however, this rate will depend on the (typically unknown) eigenvalues of the covariance
operator. 

In Section 2, some basic notation and definitions are introduced. For practical im- 
plementation of the results of Section 4, the estimation of unknown parameters will be 
needed, an issue addressed in Section 5. An example and some further comments are 
given in Section 6. The mathematical results for perturbation of compact, positive Her- 
mitian operators that, in particular, yield the Frechet derivative are reviewed without 
proof in the Appendix. 

2. Basic notation, definitions and assumptions 

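Let $(\Omega, \mathcal{F}, \mathbb{P})$ denote a probability space, $\mathbb{H}$ an infinite-dimensional, separable Hilbert
space with inner product $\langle\cdot,\cdot\rangle$, norm $\|\cdot\|$ and $\sigma$-field of Borel sets $\mathcal{B}_{\mathbb{H}}$, and let $X : \Omega \to \mathbb{H}$
be a random element in $\mathbb{H}$, that is, an $(\mathcal{F}, \mathcal{B}_{\mathbb{H}})$-measurable mapping. Throughout, it will
be required that

$$\mathbb{E}\|X\|^4 < \infty. \tag{2.1}$$

Under this condition, the mean $\mathbb{E}X = \mu \in \mathbb{H}$ exists, meaning that (Laha and Rohatgi
(1979))

$$\mathbb{E}\langle f, X\rangle = \langle f, \mu\rangle \quad \forall f \in \mathbb{H}. \tag{2.2}$$

Under assumption (2.1), the covariance operator $\Sigma$ of $X$ also exists. It is known to be
uniquely determined by the relation

$$\mathbb{E}\langle f, X-\mu\rangle\langle X-\mu, g\rangle = \mathbb{E}\langle f, ((X-\mu)\otimes(X-\mu))g\rangle = \langle f, \Sigma g\rangle \quad \forall f, g \in \mathbb{H}, \tag{2.3}$$

where "$\otimes$" denotes the tensor product in $\mathbb{H}$. We will also write

$$\Sigma = \mathbb{E}(X-\mu)\otimes(X-\mu). \tag{2.4}$$

Such a covariance operator is nonnegative Hermitian and has finite trace $\mathbb{E}\|X-\mu\|^2$, so it is
also compact. We will therefore assume, without real loss of generality, that

$$\Sigma \text{ is strictly positive, that is, } \langle f, \Sigma f\rangle > 0 \quad \forall f \ne 0, \tag{2.5}$$

and hence that $\Sigma$ is injective. It is well known that $\Sigma$ has spectral representation

$$\Sigma = \sum_{k=1}^{\infty} \lambda_k P_k, \tag{2.6}$$

where $\lambda_1 > \lambda_2 > \cdots \downarrow 0$ are the eigenvalues of $\Sigma$ and $P_1, P_2, \ldots$ the projections onto the
corresponding finite-dimensional eigenspaces.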
Let $\mathcal{L}$ denote the Banach space of all bounded linear operators that map $\mathbb{H}$ into itself.
The ordinary operator norm in $\mathcal{L}$ will be denoted by $\|\cdot\|$ without confusion. Of particular
importance in this paper, however, is the subspace $\mathcal{L}(\mathrm{HS})$ of all Hilbert-Schmidt
operators. This space becomes a separable Hilbert space when it is endowed with the
inner product

$$\langle U, V\rangle_{\mathrm{HS}} = \sum_{k=1}^{\infty} \langle Ue_k, Ve_k\rangle, \quad U, V \in \mathcal{L}(\mathrm{HS}), \tag{2.7}$$

where $e_1, e_2, \ldots$ is an orthonormal basis of $\mathbb{H}$. This inner product does not depend on the
choice of basis; see Lax (2000). The norm and tensor product in $\mathcal{L}(\mathrm{HS})$ will be denoted
by $\|\cdot\|_{\mathrm{HS}}$ and $\otimes_{\mathrm{HS}}$, respectively.

The space $\mathcal{L}(\mathrm{HS})$ is important for the study of weak convergence of the sample covariance
operator. At this point, let us simply note that $\Sigma \in \mathcal{L}(\mathrm{HS})$ and that $(X-\mu)\otimes(X-\mu)$
is a random element in $\mathcal{L}(\mathrm{HS})$. As a random element in this Hilbert space, it has its own
covariance operator; this operator exists due to condition (2.1) and can easily be seen to
equal (cf. (2.3) and (2.4))

$$\begin{aligned}
&\mathbb{E}\{(X-\mu)\otimes(X-\mu) - \Sigma\} \otimes_{\mathrm{HS}} \{(X-\mu)\otimes(X-\mu) - \Sigma\} \\
&\quad = \mathbb{E}\{(X-\mu)\otimes(X-\mu)\} \otimes_{\mathrm{HS}} \{(X-\mu)\otimes(X-\mu)\} - \Sigma \otimes_{\mathrm{HS}} \Sigma \\
&\quad = \Sigma_{\mathrm{HS}}.
\end{aligned} \tag{2.8}$$

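Next, let us suppose that $\mathbb{H}_1$ and $\mathbb{H}_2$ are two closed subspaces of $\mathbb{H}$ such that

$$\mathbb{H} = \mathbb{H}_1 \oplus \mathbb{H}_2, \quad \mathbb{H}_1 \perp \mathbb{H}_2. \tag{2.9}$$

Denote the orthogonal projection of $\mathbb{H}$ onto $\mathbb{H}_j$ by $\Pi_j$, let $X_j = \Pi_j X$, $\mu_j = \Pi_j \mu$ and let
$\Sigma_{jk}$ denote the restriction of $\Sigma$ to $\mathbb{H}_j$ and $\mathbb{H}_k$, that is,

$$\Sigma_{jk} = \Pi_j \Sigma \Pi_k, \quad j, k = 1, 2. \tag{2.10}$$

Because the $\Pi_j$ are bounded and $\Sigma$ is Hilbert-Schmidt (and hence compact), each operator
$\Sigma_{jk}$ is still Hilbert-Schmidt (and hence compact). In addition, the $\Sigma_{jj}$ are strictly
positive Hermitian. Let us also note that

$$\Sigma_{12}^* = (\Pi_1 \Sigma \Pi_2)^* = \Pi_2 \Sigma \Pi_1 = \Sigma_{21}. \tag{2.11}$$

Similarly to (2.6), $\Sigma_{jj}$ has a spectral representation of the form

$$\Sigma_{jj} = \sum_{k=1}^{\infty} \lambda_{jk} P_{jk}, \tag{2.12}$$

where $\lambda_{j1} > \lambda_{j2} > \cdots \downarrow 0$ are the eigenvalues of $\Sigma_{jj}$ and $P_{j1}, P_{j2}, \ldots$ the projections onto
the corresponding finite-dimensional eigenspaces.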
Suppose, now, that we are given a random sample $X_1, X_2, \ldots, X_n$ of independent copies
of $X$. The usual estimators of $\mu$ and $\Sigma$ are

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})\otimes(X_i - \bar{X}), \tag{2.13}$$

respectively. This operator $\hat{\Sigma}$ has all of the properties of $\Sigma$, including its being of Hilbert-
Schmidt type, except that it has a so-called finite-dimensional kernel (Riesz and Sz.-Nagy
(1990)) with a range of dimension at most $n$. Hence, this operator can never be injective,
not even when $\Sigma$ is (as we assume). The fact that $\hat{\Sigma}$ is not injective is the source of
difficulties associated with defining the sample principal canonical correlation that turns
out to always be 1, as has been pointed out by Leurgans et al. (1993). These authors
state that regularization is indispensable in the sample case.
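The degeneracy described above can be seen in a small finite-dimensional sketch. It assumes that each of the two components is observed on more coordinates than there are observations, which stands in for the infinite-dimensional situation; the function names `unregularized_rho` and `regularized_rho_sq`, the dimensions and the value of the ridge parameter are illustrative choices and do not come from the paper.

```python
import numpy as np

def _center(X):
    # Subtract the sample mean from every observation (rows are observations).
    return X - X.mean(axis=0)

def unregularized_rho(X1, X2, tol=1e-10):
    """First sample canonical correlation without regularization.

    It is the largest singular value of Q1' Q2, where Q1 and Q2 are orthonormal
    bases of the column spaces of the two centered data matrices.
    """
    def orth_basis(A):
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        return U[:, s > tol * s[0]]
    Q1, Q2 = orth_basis(_center(X1)), orth_basis(_center(X2))
    return float(np.linalg.svd(Q1.T @ Q2, compute_uv=False)[0])

def regularized_rho_sq(X1, X2, alpha):
    """Squared first canonical correlation after adding alpha * I to both diagonal blocks."""
    n = X1.shape[0]
    X1c, X2c = _center(X1), _center(X2)
    S11, S22, S12 = X1c.T @ X1c / n, X2c.T @ X2c / n, X1c.T @ X2c / n

    def inv_sqrt(S):
        # (alpha * I + S)^(-1/2) through the eigendecomposition of S.
        w, V = np.linalg.eigh(S)
        w = np.clip(w, 0.0, None)
        return (V / np.sqrt(alpha + w)) @ V.T

    M = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return float(np.linalg.svd(M, compute_uv=False)[0] ** 2)

rng = np.random.default_rng(0)
n, d = 20, 30                       # fewer observations than coordinates
X1 = rng.standard_normal((n, d))    # independent blocks: population correlation is 0
X2 = rng.standard_normal((n, d))
rho_plain = unregularized_rho(X1, X2)
rho_sq_reg = regularized_rho_sq(X1, X2, alpha=0.5)
```

Although the two blocks are independent, `rho_plain` equals 1 up to rounding, because both centered data matrices span the same $(n-1)$-dimensional space; the regularized quantity stays strictly below 1.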

The canonical correlation concept considered here can also be viewed from the perspective
of Hilbert-space-indexed processes (e.g., Parzen (1970)) corresponding to $\mathbb{H}$ inner
products involving the random elements $X_j = \Pi_j X$, $j = 1, 2$. Thus, it has direct ties to
(functional) analysis of variance and discriminant analysis that parallel the relationship 
between these methods for classical multivariate analysis (e.g., Kshirsagar (1972), Eu- 
bank and Hsing (2006) and Shin (2006)). The necessity of regularization in this context 
follows from results in Bickel and Levina (2004), while the use of regularized discriminant 
analysis methods with functional data has been explored by Hastie et al. (1995).

2006 argue that regularization is expedient, even when the population canonical corre- 
lation is considered, because, without it, canonical variates may not exist and the relation 
with the spectral characteristics of an associated pair of operators is lost. Hence, in this 
paper, both the sample and the population canonical correlation will be regularized and 
compared at the same fixed, but arbitrary, level of the regularization parameter. 

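In order to specify the regularization that will be employed here, let us replace $\Sigma$ with
$\alpha I + \Sigma$ and $\hat{\Sigma}$ with $\alpha I + \hat{\Sigma}$, where $I$ is the identity operator and $\alpha > 0$. Let us also replace
$\Sigma_{jk}$ and $\hat{\Sigma}_{jk}$ with

$$\Pi_j(\alpha I + \Sigma)\Pi_k = \begin{cases} \alpha I_j + \Sigma_{jj}, & j = k, \\ \Sigma_{jk}, & j \ne k, \end{cases} \tag{2.14}$$

$$\Pi_j(\alpha I + \hat{\Sigma})\Pi_k = \begin{cases} \alpha I_j + \hat{\Sigma}_{jj}, & j = k, \\ \hat{\Sigma}_{jk}, & j \ne k, \end{cases} \tag{2.15}$$

respectively, where $I_j = \Pi_j$ is essentially the identity operator restricted to $\mathbb{H}_j$. Let us
write $\mathbb{H}_1^0 = \mathbb{H}_1 \setminus \{0\}$, $\mathbb{H}_2^0 = \mathbb{H}_2 \setminus \{0\}$ for brevity.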

Definition 2.1. Fix $\alpha > 0$. The regularized squared principal canonical correlation
(RSPCC) for the population is defined as

$$\rho^2 = \rho^2(\alpha) = \max_{f_1 \in \mathbb{H}_1^0,\ f_2 \in \mathbb{H}_2^0} \frac{\langle f_1, \Sigma_{12} f_2\rangle^2}{\langle f_1, (\alpha I_1 + \Sigma_{11}) f_1\rangle \langle f_2, (\alpha I_2 + \Sigma_{22}) f_2\rangle}. \tag{2.16}$$

Its sample analogue is $\hat{\rho}^2 = \hat{\rho}^2(\alpha)$, obtained from (2.16) by replacing $\Sigma_{jk}$ with $\hat{\Sigma}_{jk}$. Pairs
of maximizers will be respectively denoted by $f_1^* = f_{1,\alpha}^*$, $f_2^* = f_{2,\alpha}^*$ for the population and
by $\hat{f}_1^* = \hat{f}_{1,\alpha}^*$, $\hat{f}_2^* = \hat{f}_{2,\alpha}^*$ for the sample. The corresponding canonical variates are

$$\langle X_j, f_j^*\rangle, \quad \langle X_j, \hat{f}_j^*\rangle, \quad j = 1, 2. \tag{2.17}$$

Warning. Since, throughout the sequel, $\alpha > 0$ will be arbitrary, but fixed, the dependence
on $\alpha$ is henceforth suppressed in the notation.



Several properties have been shown in 2006, in particular, that, for $\alpha > 0$, a maximizer
always exists. This can, in fact, be seen as an implication of the following result of that
paper. Define the operators ($\alpha > 0$)

$$R_1 = (\alpha I_1 + \Sigma_{11})^{-1/2}\, \Sigma_{12}\, (\alpha I_2 + \Sigma_{22})^{-1}\, \Sigma_{21}\, (\alpha I_1 + \Sigma_{11})^{-1/2}, \tag{2.18}$$

$$R_2 = (\alpha I_2 + \Sigma_{22})^{-1/2}\, \Sigma_{21}\, (\alpha I_1 + \Sigma_{11})^{-1}\, \Sigma_{12}\, (\alpha I_2 + \Sigma_{22})^{-1/2} \tag{2.19}$$

and their sample analogues $\hat{R}_1$ and $\hat{R}_2$. Since all factors defining these operators are
bounded, with $\Sigma_{12}$ and $\Sigma_{21}$ or their sample analogues even Hilbert-Schmidt (and hence
compact), it follows that these operators are also Hilbert-Schmidt (and hence compact).
It will be assumed that

$$\begin{aligned}
&R_1 \text{ and } R_2 \text{ have a largest eigenvalue with one-dimensional eigenspace} \\
&\text{generated by } f_1^* \text{ and } f_2^*, \text{ respectively, where } \|f_1^*\| = \|f_2^*\| = 1.
\end{aligned} \tag{2.20}$$

Theorem 2.1. For $\alpha > 0$, we have

$$\rho^2 = \text{largest eigenvalue of } R_j = \langle f_j^*, R_j f_j^*\rangle \tag{2.21}$$

for $j = 1, 2$. A similar result holds true for $\hat{\rho}^2$.

The maximizers or canonical variates are essentially unique if the eigenspaces corre- 
sponding to this maximal eigenvalue are one-dimensional. The same properties hold true 
for the sample analogue. 
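The content of Theorem 2.1 can be checked numerically in a Euclidean stand-in for $\mathbb{H}$. The sketch below assumes a randomly generated covariance matrix on $\mathbb{R}^7$ split into blocks of sizes 3 and 4; the helper names `inv_power`, `split`, `R_operators` and `objective` are hypothetical. It compares the largest eigenvalues of the matrix versions of (2.18) and (2.19) with the value of the ratio in (2.16) at a pair of vectors recovered from the top eigenvector of $R_1$ by rescaling.

```python
import numpy as np

def inv_power(S, alpha, p):
    """(alpha * I + S)^(-p/2) for a symmetric nonnegative definite matrix S."""
    w, V = np.linalg.eigh(S)
    return (V * (alpha + w) ** (-p / 2.0)) @ V.T

def split(Sigma, d1):
    # Blocks Sigma_11, Sigma_12, Sigma_21, Sigma_22 as in (2.10).
    return Sigma[:d1, :d1], Sigma[:d1, d1:], Sigma[d1:, :d1], Sigma[d1:, d1:]

def R_operators(Sigma, d1, alpha):
    """Matrix versions of R_1 in (2.18) and R_2 in (2.19)."""
    S11, S12, S21, S22 = split(Sigma, d1)
    R1 = inv_power(S11, alpha, 1) @ S12 @ inv_power(S22, alpha, 2) @ S21 @ inv_power(S11, alpha, 1)
    R2 = inv_power(S22, alpha, 1) @ S21 @ inv_power(S11, alpha, 2) @ S12 @ inv_power(S22, alpha, 1)
    return R1, R2

def objective(Sigma, d1, alpha, f1, f2):
    """The ratio that is maximized in (2.16)."""
    S11, S12, S21, S22 = split(Sigma, d1)
    num = (f1 @ S12 @ f2) ** 2
    den = (f1 @ (alpha * f1 + S11 @ f1)) * (f2 @ (alpha * f2 + S22 @ f2))
    return num / den

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 7))
Sigma = A @ A.T / 7                      # illustrative covariance matrix
d1, alpha = 3, 0.1
R1, R2 = R_operators(Sigma, d1, alpha)
top1 = np.linalg.eigvalsh(R1)[-1]        # largest eigenvalue of R_1
top2 = np.linalg.eigvalsh(R2)[-1]        # largest eigenvalue of R_2

# Rescale the top eigenvector of R_1 to obtain a maximizing pair for (2.16).
S11, S12, S21, S22 = split(Sigma, d1)
v1 = np.linalg.eigh(R1)[1][:, -1]
f1 = inv_power(S11, alpha, 1) @ v1
f2 = inv_power(S22, alpha, 2) @ S21 @ f1
attained = objective(Sigma, d1, alpha, f1, f2)
```

In this sketch `top1`, `top2` and `attained` agree up to rounding, and the ratio evaluated at arbitrary other pairs of vectors does not exceed them.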

3. A delta-method for analytic functions of the 
sample covariance operator 

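Assuming (2.1), Dauxois et al. (1982) have shown the fundamental result

$$\sqrt{n}(\hat{\Sigma} - \Sigma) \xrightarrow{d} \mathcal{G}, \quad \text{as } n \to \infty, \text{ in } \mathcal{L}(\mathrm{HS}), \tag{3.1}$$

where $\mathcal{G}$ is a zero-mean Gaussian random element in the Hilbert space $\mathcal{L}(\mathrm{HS})$ with
covariance operator

$$\mathbb{E}\,\mathcal{G} \otimes_{\mathrm{HS}} \mathcal{G} = \Sigma_{\mathrm{HS}}, \tag{3.2}$$

as defined in (2.8). The continuous mapping theorem immediately yields that

$$\sqrt{n}(\hat{\Sigma}_{jk} - \Sigma_{jk}) \xrightarrow{d} \Pi_j \mathcal{G} \Pi_k = \mathcal{G}_{jk}, \quad \text{as } n \to \infty, \text{ in } \mathcal{L}(\mathrm{HS}). \tag{3.3}$$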
Let $D \subset \mathbb{C}$ be the open domain in the complex plane defined by

$$D = \Big\{ z \in \mathbb{C} : \min_{0 \le x \le \|\Sigma\|} |z - x| < \tfrac{1}{2}\alpha \Big\}, \tag{3.4}$$

where $\alpha > 0$ is the regularization parameter. This domain can be used for all the specific
functions we need to consider. It seems worthwhile, however, to first consider an arbitrary
function

$$\varphi : D \to \mathbb{C}, \quad \text{analytic on } D. \tag{3.5}$$

As in the Appendix, let $\mathcal{C}_H$ denote the class of all compact Hermitian operators on $\mathbb{H}$ and
$\mathcal{L}_H$ the class of all bounded Hermitian operators. Let us consider the operator $\varphi(\Sigma + \mathcal{P})$
in $\mathcal{L}_H$, for $\mathcal{P}$ in $\mathcal{C}_H$ with $\|\mathcal{P}\| < \tfrac{1}{2}\alpha$. This operator-valued function has a Frechet derivative
at $\Sigma$, tangentially to $\mathcal{C}_H$, denoted by $\varphi'_\Sigma$, and given by (A.6). This operator $\varphi'_\Sigma : \mathcal{C}_H \to \mathcal{L}_H$
is bounded in the usual operator norm.

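If $\mathcal{L}_H(\mathrm{HS}) \subset \mathcal{C}_H$ is the subspace of all Hermitian Hilbert-Schmidt operators, we even
have

$$\varphi'_\Sigma : \mathcal{L}_H(\mathrm{HS}) \to \mathcal{L}_H(\mathrm{HS}), \quad \text{bounded in } \|\cdot\|_{\mathrm{HS}}. \tag{3.6}$$

To see this, take $\mathcal{P} \in \mathcal{L}_H(\mathrm{HS})$ and observe that

$$\|\varphi'_\Sigma \mathcal{P}\|_{\mathrm{HS}}^2 = \sum_{k=1}^{\infty} \|\varphi'_\Sigma \mathcal{P} e_k\|^2 \le \|\varphi'_\Sigma\|^2 \sum_{k=1}^{\infty} \|\mathcal{P} e_k\|^2 = \|\varphi'_\Sigma\|^2 \|\mathcal{P}\|_{\mathrm{HS}}^2 < \infty, \tag{3.7}$$

exploiting the boundedness of $\varphi'_\Sigma$ in the usual operator norm. It is well known (Lax
(2000)) that

$$\|T\| \le \|T\|_{\mathrm{HS}}, \quad T \in \mathcal{L}(\mathrm{HS}). \tag{3.8}$$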

We are now ready to establish a "delta-method" for random operators. For random 
matrices, the result follows from Watson (1983) and can be found in Ruymgaart and 
Yang (1997). 

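Theorem 3.1. If (2.1) is satisfied, it then follows that

$$\sqrt{n}\{\varphi(\hat{\Sigma}) - \varphi(\Sigma)\} \xrightarrow{d} \mathcal{H}, \quad \text{as } n \to \infty, \text{ in } \mathcal{L}(\mathrm{HS}), \tag{3.9}$$

where $\mathcal{H}$ is the zero-mean Gaussian random element of $\mathcal{L}(\mathrm{HS})$ given by

$$\mathcal{H} = \varphi'_\Sigma \mathcal{G} = \sum_{j \ge 1} \varphi'(\lambda_j) P_j \mathcal{G} P_j + \sum\sum_{j \ne k} \frac{\varphi(\lambda_k) - \varphi(\lambda_j)}{\lambda_k - \lambda_j} P_j \mathcal{G} P_k, \tag{3.10}$$

with $\mathcal{G}$ given in (3.1).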

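Proof. Let us consider $\hat{\mathcal{P}} = \hat{\Sigma} - \Sigma$ as a random perturbation (cf. Dauxois et al. (1982),
Watson (1983)) and note that, by (3.1) and (3.8), we have $\|\hat{\mathcal{P}}\| \le \|\hat{\mathcal{P}}\|_{\mathrm{HS}} = O_p(n^{-1/2})$ as
$n \to \infty$. This implies that, for numbers $n^{-1/2} \ll \varepsilon_n \ll n^{-1/4}$, we have

$$\mathbb{P}(\Omega_n) = \mathbb{P}\{\omega \in \Omega : \|\hat{\mathcal{P}}(\omega)\| < \varepsilon_n\} \to 1 \quad \text{as } n \to \infty. \tag{3.11}$$

According to Theorem A.1 and (3.11), we have, for $n$ sufficiently large,

$$\begin{aligned}
\sqrt{n}\{\varphi(\hat{\Sigma}) - \varphi(\Sigma)\} &= \sqrt{n}\{\varphi(\hat{\Sigma}) - \varphi(\Sigma)\} 1_{\Omega_n} + \sqrt{n}\{\varphi(\hat{\Sigma}) - \varphi(\Sigma)\} 1_{\Omega_n^c} \\
&= \sqrt{n}\{\varphi'_\Sigma \hat{\mathcal{P}} + O(\|\hat{\mathcal{P}}\|^2)\} 1_{\Omega_n} + o_p(1) \\
&= \varphi'_\Sigma(\sqrt{n}(\hat{\Sigma} - \Sigma)) + o_p(1).
\end{aligned} \tag{3.12}$$

The results in the theorem follow from (3.12) by applying (3.1) once more, in conjunction
with (3.6) and the continuous mapping theorem. □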

Remark 3.1. The double sum in (3.10) is, in fact, a correction term that is needed
because we may not assume that the "increments" $\hat{\mathcal{P}} = \hat{\Sigma} - \Sigma$ and $\Sigma$ commute; see also
Remark A.1.
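For matrices, the role of the correction term can be illustrated numerically. The following sketch assumes a symmetric positive semidefinite matrix $T$ with simple eigenvalues in place of the covariance operator, a normalized symmetric increment $P$, and the function $z \mapsto (\alpha + z)^{-1/2}$; the names `apply_function` and `frechet_derivative` are hypothetical. It compares the actual increment of the matrix function with the linearization built from the first-derivative sum plus the divided-difference double sum, and with the simpler expression that would only be valid for commuting increments.

```python
import numpy as np

def apply_function(T, phi):
    """phi(T) for a symmetric matrix T, through its eigendecomposition."""
    lam, V = np.linalg.eigh(T)
    return (V * phi(lam)) @ V.T

def frechet_derivative(T, P, phi, dphi):
    """Linearization of phi at T in direction P: first-derivative terms on the
    diagonal of the eigenbasis, divided differences off the diagonal."""
    lam, V = np.linalg.eigh(T)
    diff = lam[:, None] - lam[None, :]
    distinct = np.abs(diff) > 1e-12
    num = phi(lam)[:, None] - phi(lam)[None, :]
    K = np.where(distinct, num / np.where(distinct, diff, 1.0), dphi(lam)[:, None])
    return V @ (K * (V.T @ P @ V)) @ V.T

alpha = 0.5                                     # illustrative regularization level
phi = lambda z: (alpha + z) ** -0.5
dphi = lambda z: -0.5 * (alpha + z) ** -1.5

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
T = A @ A.T / 5                                 # stand-in for the covariance operator
B = rng.standard_normal((5, 5))
P = (B + B.T) / 2
P /= np.linalg.norm(P, 2)                       # increment with unit operator norm

eps = 1e-4
increment = apply_function(T + eps * P, phi) - apply_function(T, phi)
err_full = np.linalg.norm(increment - eps * frechet_derivative(T, P, phi, dphi), 2)
err_naive = np.linalg.norm(increment - eps * apply_function(T, dphi) @ P, 2)
```

Here `err_full` is of second order in `eps`, whereas `err_naive`, which drops the double sum, is only of first order because $T$ and $P$ do not commute.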

In order to obtain asymptotic distributions for functional canonical correlations and
variates, Theorem 3.1 will be employed for the specific functions

$$\varphi_p(z) = (\alpha + z)^{-p/2}, \quad z \in D, \quad p = 1, 2. \tag{3.13}$$

These functions are indeed analytic on $D$. For brevity, let us simply write $\varphi'_{p,j}$ for the
Frechet derivative evaluated at $\Sigma_{jj}$. It is immediate from (2.9) that $\|\Sigma_{jj}\| \le \|\Sigma\|$ and
therefore the domain $D$ can still be used for $\Sigma_{jj}$. The following corollary is immediate
from these remarks, (3.3) and Theorem 3.1.

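Corollary 3.1. With $\varphi_p$ as in (3.13), we have, for $j = 1, 2$,

$$\sqrt{n}\{\varphi_p(\hat{\Sigma}_{jj}) - \varphi_p(\Sigma_{jj})\} \xrightarrow{d} \varphi'_{p,j}\mathcal{G}_{jj}, \tag{3.14}$$

where the limit is a zero-mean Gaussian random element in $\mathcal{L}(\mathrm{HS})$ and, more explicitly,

$$\begin{aligned}
\varphi'_{p,j}\mathcal{G}_{jj} = {} & -\frac{p}{2} \sum_{k \ge 1} \frac{1}{(\alpha + \lambda_{jk})^{(p+2)/2}} P_{jk}\mathcal{G}_{jj}P_{jk} \\
& + \sum\sum_{m \ne n} \frac{(\alpha + \lambda_{jm})^{p/2} - (\alpha + \lambda_{jn})^{p/2}}{(\lambda_{jn} - \lambda_{jm})(\alpha + \lambda_{jm})^{p/2}(\alpha + \lambda_{jn})^{p/2}} P_{jm}\mathcal{G}_{jj}P_{jn}.
\end{aligned} \tag{3.15}$$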

4. Asymptotics for the sample RSPCC and variates 

The basic ingredients for the asymptotic distribution of the sample RSPCC and its
variates are the weak limits of the associated operators $\hat{R}_1$ and $\hat{R}_2$ (cf. (2.18) and (2.19))
from which these quantities are derived. These limits follow rather routinely with the
help of Corollary 3.1. It has already been observed that $\hat{R}_j, R_j \in \mathcal{L}(\mathrm{HS})$ for $j = 1, 2$.
Let us introduce the following zero-mean Gaussian elements of $\mathcal{L}(\mathrm{HS})$:



&11 = (<< 5 i,l^ll)^'12 ( /'2(S22)S21 l ^l(Sii) 

3^12 = (2x1)^12^2(^22)221 ^i(En), 

&13 = 9 5 l(2ll)2i2(^ 2j2 ^22 ) 221<Pl(Sii) 
#14 = 9 5 l(2ll)2i2(^2 ( 222)^21<Pl(2ii), 
#15 = 9 5 l(2ll)2 12 < < f 2 (E22 ) 221</3i il (5ll) 



(4.1) 
(4.2) 
(4.3) 
(4.4) 
(4.5) 



5 




(4.6) 



and, similarly, 



#21 = (< < 5i ! 2^22)2 2 i( ) !32(2ll)i;i2^l(S2 2 ) 
#22 = <y9l(2 2 2)£21</ ; '2(2ii)i;i2^l(222), 
#23 = Vl(222)22x(V2,l^ll)2l2¥'l(222) 
#24 = V 3 l(222)221<^2(2ii)5i2<^l(2 2 2), 
#25 = </ 3 l(222)2 2 iy'2(2ii)I]i2^' 1 2(^22) 



(4.7) 
(4.8) 
(4.9) 
(4.10) 
(4.11) 



5 




(4.12) 



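Theorem 4.1. Let (2.1) be satisfied. We have

$$\sqrt{n}(\hat{R}_j - R_j) \xrightarrow{d} \mathcal{R}_j, \quad \text{as } n \to \infty, \text{ in } \mathcal{L}(\mathrm{HS}) \text{ for } j = 1, 2. \tag{4.13}$$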
Proof. It suffices to prove (4.13) for $j = 1$. The left-hand side of (4.13) can be decomposed
as $\sum_{k=1}^{5} \hat{\mathcal{R}}_{1k}$ where, for instance,

$$\hat{\mathcal{R}}_{11} = \sqrt{n}\{\varphi_1(\hat{\Sigma}_{11}) - \varphi_1(\Sigma_{11})\}\,\hat{\Sigma}_{12}\,\varphi_2(\hat{\Sigma}_{22})\,\hat{\Sigma}_{21}\,\varphi_1(\hat{\Sigma}_{11}). \tag{4.14}$$

It follows from (3.14) that the first factor in (4.14) equals $\varphi'_{1,1}\mathcal{G}_{11} + o_p(1)$. Relation (3.3)
and the continuity of the functions in (3.13) imply that the product of the remaining
four factors equals $\Sigma_{12}\varphi_2(\Sigma_{22})\Sigma_{21}\varphi_1(\Sigma_{11}) + o_p(1)$. In combination, these results yield
that $\hat{\mathcal{R}}_{11} = \mathcal{R}_{11} + o_p(1)$. In a similar manner, one can deal with $\hat{\mathcal{R}}_{12}, \ldots, \hat{\mathcal{R}}_{15}$. Eventually,
this produces $\sqrt{n}(\hat{R}_1 - R_1) = \sum_{k=1}^{5} \mathcal{R}_{1k} + o_p(1)$ and we are done. □

To establish (4.13), we have exploited the delta-method of (3.14), based on the 
Frechet derivative, in order to deal with the factors in the product defining Rj. Once the 
limiting distributions of the random operators have been established, we may proceed 
as in Dauxois et al. (1982) to find the asymptotic distributions of eigenvalues and eigen- 
vectors. For completeness, the required perturbation results in the infinite-dimensional 
situation are briefly summarized in the Appendix and proofs of the two main theorems 
below are included. 

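First, some more notation will be needed. The compact operators $R_j$ and $\hat{R}_j$ are
nonnegative Hermitian and have spectral representations

$$R_j = \sum_{k=1}^{\infty} \rho_{jk} Q_{jk}, \quad \hat{R}_j = \sum_{k=1}^{\infty} \hat{\rho}_{jk} \hat{Q}_{jk}, \quad j = 1, 2, \tag{4.15}$$

where $\rho_{j1} > \rho_{j2} > \cdots \downarrow 0$ and $\hat{\rho}_{j1} > \hat{\rho}_{j2} > \cdots \downarrow 0$ are the distinct eigenvalues and $Q_{jk}$,
$\hat{Q}_{jk}$ the orthogonal projections onto the corresponding finite-dimensional eigenspaces.
Assumption (2.20) implies that

$$\rho_{j1} = \rho^2, \quad Q_{j1} = f_j^* \otimes f_j^*, \quad j = 1, 2. \tag{4.16}$$

We also have, by Definition 2.1 and Theorem 2.1, that

$$\hat{\rho}_{j1} = \hat{\rho}^2, \quad j = 1, 2. \tag{4.17}$$

The operators

$$A_j = \sum_{k \ge 2} \frac{1}{\rho_{j1} - \rho_{jk}} Q_{jk}, \quad j = 1, 2, \tag{4.18}$$

will also be needed.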



Theorem 4.2. Let (2.1) and (2.20) be satisfied. The sample RSPCC then has a normal
distribution in the limit:

$$\sqrt{n}(\hat{\rho}^2 - \rho^2) \xrightarrow{d} N(0, \sigma^2) \quad \text{as } n \to \infty, \tag{4.19}$$



The delta method for analytic functions 






where

$$\sigma^2 = \mathbb{E}\langle \mathcal{R}_j f_j^*, f_j^*\rangle^2, \quad j = 1, 2. \tag{4.20}$$

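Proof. The proof is in the same vein as that of Theorem 3.1. However, let us now
consider the random perturbation $\hat{\mathcal{P}} = \hat{R}_j - R_j$ and define $\Omega_n$ for the same $\varepsilon_n$, but with
$\hat{\mathcal{P}}$ as above. In the present situation, it is (4.13) that guarantees that $\mathbb{P}(\Omega_n) \to 1$ as
$n \to \infty$.

It follows from Theorem A.2 that

$$\hat{Q}_{j1} 1_{\Omega_n} = \hat{f}_j^* \otimes \hat{f}_j^*\, 1_{\Omega_n} \tag{4.21}$$

for $n$ sufficiently large. Application of Theorem A.3 yields

$$\begin{aligned}
\sqrt{n}(\hat{\rho}_{j1} - \rho_{j1}) &= \sqrt{n}(\hat{\rho}_{j1} - \rho_{j1}) 1_{\Omega_n} + \sqrt{n}(\hat{\rho}_{j1} - \rho_{j1}) 1_{\Omega_n^c} \\
&= \sqrt{n}\{\langle \hat{\mathcal{P}} f_j^*, f_j^*\rangle + O(\|\hat{\mathcal{P}}\|^2)\} 1_{\Omega_n} + o_p(1) \\
&= \langle \sqrt{n}(\hat{R}_j - R_j) f_j^*, f_j^*\rangle + o_p(1).
\end{aligned} \tag{4.22}$$

Because of (4.16) and (4.17), the expression on the left in (4.22) equals the one on the
left in (4.19), so the theorem follows. □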

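Theorem 4.3. Assuming the validity of (2.1) and (2.20), we have

$$\sqrt{n}(\hat{f}_j^* - f_j^*) \xrightarrow{d} A_j \mathcal{R}_j f_j^*, \quad \text{as } n \to \infty, \text{ in } \mathbb{H} \text{ for } j = 1, 2. \tag{4.23}$$

Proof. Let us consider the same random perturbation $\hat{\mathcal{P}} = \hat{R}_j - R_j$ and the same sets
$\Omega_n$ as in the proof of Theorem 4.2. Let us also recall (4.21). It follows from Theorem A.2
that

$$\hat{f}_j^* 1_{\Omega_n} = \{f_j^* + A_j \hat{\mathcal{P}} f_j^* + O(\|\hat{\mathcal{P}}\|^2)\} 1_{\Omega_n}. \tag{4.24}$$

In the same manner as (4.22), we now obtain

$$\sqrt{n}(\hat{f}_j^* - f_j^*) = A_j \sqrt{n}(\hat{R}_j - R_j) f_j^* + o_p(1) \xrightarrow{d} A_j \mathcal{R}_j f_j^* \quad \text{as } n \to \infty, \tag{4.25}$$

which proves the theorem. □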

5. Further specification of limiting distributions 

The distributions on the right in (4.19) and (4.23) contain unknown parameters that 
must be estimated for practical implementation. Let us first consider the variance in 



J. Cupidon, D.S. Gilliam, R. Eubank and F. Ruymgaart

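(4.20). Substituting (4.6) or (4.12) yields

$$\sigma^2 = \sum_{k=1}^{5}\sum_{m=1}^{5} \mathbb{E}\langle \mathcal{R}_{jk} f_j^*, f_j^*\rangle \langle \mathcal{R}_{jm} f_j^*, f_j^*\rangle. \tag{5.1}$$

Subsequent substitution of the expressions for the $\mathcal{R}_{jk}$ shows, after reworking the inner
products, that the expression for $\sigma^2$ in (5.1) is a sum of terms of the type

$$\mathbb{E}\langle \mathcal{G} f, g\rangle \langle \mathcal{G} p, q\rangle, \tag{5.2}$$

where $f, g, p, q \in \mathbb{H}$ depend on $\Sigma$ and where $\mathcal{G}$ is given in (3.1).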

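Lemma 5.1. If $f, g, p, q \in \mathbb{H}$ are known, we can express (5.2) as

$$\mathbb{E}\langle \mathcal{G} f, g\rangle \langle \mathcal{G} p, q\rangle = \langle q \otimes p, \Sigma_{\mathrm{HS}}\, g \otimes f\rangle_{\mathrm{HS}}, \tag{5.3}$$

where $\Sigma_{\mathrm{HS}}$ is the covariance operator of $\mathcal{G}$ in (3.2).

Proof. Let us assume that $f, g, p, q \ne 0$ because, otherwise, (5.3) is trivial. Hence, we
can construct two orthonormal bases of $\mathbb{H}$, viz. $e_1, e_2, \ldots$ and $d_1, d_2, \ldots$, with

$$e_1 = \frac{f}{\|f\|}, \quad d_1 = \frac{p}{\|p\|}. \tag{5.4}$$

Rewriting and evaluating the right-hand side of (5.3), we obtain

$$\begin{aligned}
\langle q \otimes p, \Sigma_{\mathrm{HS}}\, g \otimes f\rangle_{\mathrm{HS}} &= \mathbb{E}\langle q \otimes p, (\mathcal{G} \otimes_{\mathrm{HS}} \mathcal{G})\, g \otimes f\rangle_{\mathrm{HS}} \\
&= \mathbb{E}\langle \mathcal{G}, g \otimes f\rangle_{\mathrm{HS}} \langle \mathcal{G}, q \otimes p\rangle_{\mathrm{HS}} \\
&= \mathbb{E}\Big\{\sum_{k=1}^{\infty} \langle f, e_k\rangle \langle \mathcal{G} e_k, g\rangle\Big\}\Big\{\sum_{m=1}^{\infty} \langle p, d_m\rangle \langle \mathcal{G} d_m, q\rangle\Big\} \\
&= \mathbb{E}\langle \mathcal{G} f, g\rangle \langle \mathcal{G} p, q\rangle,
\end{aligned} \tag{5.5}$$

as was to be shown. □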



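Since the $f, g, p, q$ depend on $\Sigma$, we can replace them on the right in (5.3) with
estimators obtained by substituting $\hat{\Sigma}$ for $\Sigma$. Also, $\Sigma_{\mathrm{HS}}$ is unknown and we may replace
this operator with the estimator

$$\hat{\Sigma}_{\mathrm{HS}} = \frac{1}{n}\sum_{i=1}^{n} \big[\{(X_i - \bar{X}) \otimes (X_i - \bar{X}) - \hat{\Sigma}\} \otimes_{\mathrm{HS}} \{(X_i - \bar{X}) \otimes (X_i - \bar{X}) - \hat{\Sigma}\}\big]. \tag{5.6}$$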

Let us next turn to the Gaussian random element in $\mathbb{H}$, on the right in (4.23). Substitution
of (4.6) or (4.12) shows that the covariance operator of this random element is
determined by covariances of the type

$$\sigma^2(f, g) = \sum_{k=1}^{5}\sum_{m=1}^{5} \mathbb{E}\langle f, A_j \mathcal{R}_{jk} f_j^*\rangle \langle A_j \mathcal{R}_{jm} f_j^*, g\rangle \tag{5.7}$$

and this can be seen to be a sum of terms of type

$$\mathbb{E}\langle p, \mathcal{G} f_j^*\rangle \langle \mathcal{G} f_j^*, q\rangle, \tag{5.8}$$

in the same way as above. In this case, explicit expressions for $p$ and $q$ involve the operator
$A_j$ and hence the unknown $\rho_{jk}$ and $Q_{jk}$ (see (4.18) and (4.15)). These quantities can be
estimated by the corresponding quantities for $\hat{R}_j$, and $\Sigma_{\mathrm{HS}}$ can again be estimated by (5.6)
so that, in principle, an estimator of (5.7) is available. An alternative to this estimation 
scheme could perhaps be formulated using resampling and bootstrap methods. We will 
not explore this idea further here. 



6. Example and some remarks 

The purpose of this paper is to establish some fundamental results regarding functional 
canonical correlations and their variates, at a fixed, but arbitrary, level of the regularization
parameter $\alpha > 0$. Although the question of how to choose this parameter in practice
is certainly of great interest and relevance, it is not the main concern of this paper and 
would require a lengthy discussion of further theory and numerical simulations beyond 
the scope and purpose of this work. 

As a compromise, in this section, we present an explicit example that seems suitable 
for such simulations. It concerns two dependent standard Brownian motion processes 
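that allow for canonical correlations in the entire range from 0 to 1. To construct these
processes, let

$$e_m(t) = \sqrt{2}\,\sin\big((m - \tfrac{1}{2})\pi t\big), \quad 0 \le t \le 1, \quad m \in \mathbb{N}, \tag{6.1}$$

$$\lambda_m = \Big\{\frac{1}{(m - 1/2)\pi}\Big\}^2, \quad m \in \mathbb{N}. \tag{6.2}$$

Let $\xi_{jm}$ be i.i.d. $N(0,1)$-random variables for $m \in \mathbb{N}$ and $j = 1, 2$. Choose $a_m, b_m \in \mathbb{R}$
such that

$$a_m^2 + b_m^2 = 1, \quad m \in \mathbb{N}, \tag{6.3}$$

and define ($j = 1, 2$)

$$e_{jm}(t) = e_m(t - (j-1))\,1_{[j-1,j]}(t), \quad 0 \le t \le 2, \tag{6.4}$$

$$U_{1m} = \sqrt{\lambda_m}\,\xi_{1m}, \quad U_{2m} = \sqrt{\lambda_m}\,(a_m \xi_{1m} + b_m \xi_{2m}), \quad m \in \mathbb{N}. \tag{6.5}$$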
For both values of $j$, the

$$U_{jm} \text{ are independent } N(0, \lambda_m), \quad m \in \mathbb{N}. \tag{6.6}$$

Obviously,

$$X_j(t) = \sum_{m=1}^{\infty} U_{jm}\, e_{jm}(t), \quad 0 \le t \le 2, \tag{6.7}$$

is the Karhunen-Loeve expansion of a standard Brownian motion, starting at $t = 0$ for
$j = 1$ and at $t = 1$ for $j = 2$. If we define

$$X(t) = X_1(t) + X_2(t), \quad 0 \le t \le 2, \tag{6.8}$$

then this process is a random function in $\mathbb{H} = L^2(0, 2)$ and $X_j$ can be considered as its
projection onto $\mathbb{H}_j = L^2(j-1, j)$.
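Because

$$\mathbb{E}\,U_{1k} U_{2m} = \sqrt{\lambda_k \lambda_m}\, a_m \delta_{km}, \tag{6.9}$$

a straightforward, but tedious, calculation (see 2006) shows that $\rho^2$ is the largest eigenvalue
of the diagonal matrix $\mathcal{R}$ with elements

$$\mathcal{R}(k, j) = \begin{cases} \dfrac{a_k^2 \lambda_k^2}{(\alpha + \lambda_k)^2}, & k = j, \\ 0, & k \ne j. \end{cases} \tag{6.10}$$

If we assume that

$$1 > a_1 > a_2 > \cdots, \tag{6.11}$$

then the largest eigenvalue of this matrix equals

$$\rho^2 = \frac{a_1^2 \lambda_1^2}{(\alpha + \lambda_1)^2}. \tag{6.12}$$

Choosing $a_1 = 0$ yields $X_1 \perp\!\!\!\perp X_2$ and $\rho^2 = 0$, and choosing $a_1$ close to 1 and $\alpha$ close to
0 yields a $\rho^2$ close to 1.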

A sample of size n of processes can be obtained by generating n independent, suitably 
truncated sets of i.i.d. $N(0,1)$-random variables and $\hat{R}_1$ can, in principle, be numerically
approximated, by first approximating $\bar{X}$ and $\hat{\Sigma}$ in (2.13). Finally, this should yield a
specific value of $\hat{\rho}^2$ and hence of $\sqrt{n}(\hat{\rho}^2 - \rho^2)$. This sampling process may be repeated
N times. Each of the N runs yields a value of the standardized empirical canonical 
correlation and these values could be summarized in a histogram. All of this might be 
repeated for several values of the regularization parameter a > 0. Numerical procedures 
are available, but their implementation is rather involved. Apart from these simulations, 
some criterion should be formulated that yields an optimal value of a in theory, like the 
mean integrated square error for curve estimation. The entire issue of gaining insight into 
the choice of regularization parameter seems a topic of independent interest. 
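A rough version of the simulation outlined above can be sketched as follows. The sketch works directly with the score vectors of (6.5), truncated at eight terms, rather than with discretized sample paths; the truncation level, the coefficients $a_m$, the value of $\alpha$, the sample size, the number of runs and the function names are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def bm_eigenvalues(M):
    """lambda_m = 1 / ((m - 1/2) pi)^2 for m = 1, ..., M, cf. (6.2)."""
    m = np.arange(1, M + 1)
    return 1.0 / ((m - 0.5) * np.pi) ** 2

def population_rspcc(a, alpha):
    """Largest diagonal element of the matrix in (6.10)."""
    lam = bm_eigenvalues(len(a))
    return float(np.max(a ** 2 * lam ** 2 / (alpha + lam) ** 2))

def simulate_scores(n, a, rng):
    """Draw n independent copies of the truncated score vectors in (6.5)."""
    lam = bm_eigenvalues(len(a))
    b = np.sqrt(1.0 - a ** 2)                 # so that a_m^2 + b_m^2 = 1, cf. (6.3)
    xi1 = rng.standard_normal((n, len(a)))
    xi2 = rng.standard_normal((n, len(a)))
    return np.sqrt(lam) * xi1, np.sqrt(lam) * (a * xi1 + b * xi2)

def sample_rspcc(U1, U2, alpha):
    """Sample RSPCC computed from the score vectors (coordinates in the bases e_jm)."""
    n = U1.shape[0]
    U1c, U2c = U1 - U1.mean(axis=0), U2 - U2.mean(axis=0)
    S11, S22, S12 = U1c.T @ U1c / n, U2c.T @ U2c / n, U1c.T @ U2c / n

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return (V / np.sqrt(alpha + np.clip(w, 0.0, None))) @ V.T

    M = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return float(np.linalg.svd(M, compute_uv=False)[0] ** 2)

rng = np.random.default_rng(0)
a = 0.9 * 0.5 ** np.arange(8)                 # decreasing coefficients below 1
alpha, n, N = 0.05, 1000, 200
rho_sq = population_rspcc(a, alpha)
# N replications of the standardized statistic sqrt(n) * (rho_hat^2 - rho^2).
z = np.array([np.sqrt(n) * (sample_rspcc(*simulate_scores(n, a, rng), alpha) - rho_sq)
              for _ in range(N)])
```

The array `z` holds the replicated standardized values, which could be summarized in a histogram and compared with the normal limit in (4.19); repeating the run for several values of `alpha` gives an impression of the effect of the regularization parameter.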
Appendix: Some perturbation theory

In this appendix, we briefly summarize some results from perturbation theory. A more
general version of these results can be found in a technical report by Gilliam et al. ().
In slightly different form, Theorem A.2 and Theorem A.3 can be found in Dauxois et al.
(1982). Some monographs on perturbation theory for operators are Kato (1966), Rellich
(1969) and Chatelin (1983). For matrices, Theorem A.1 can be found in Bhatia (2007). It
has already been observed that the delta-method for functions of matrices can be found 
in Ruymgaart and Yang (1997). 

All operators considered here map the infinite-dimensional, separable Hilbert space $\mathbb{H}$
into itself. As in the main body of the paper, the inner product and norm in $\mathbb{H}$ will be
denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively, and we will use $\mathcal{L}_H$ to denote all bounded Hermitian
operators on $\mathbb{H}$, with $\mathcal{C}_H$ denoting the subspace of all compact Hermitian operators
and $\mathcal{C}_H^+$ the subset of all strictly positive Hermitian operators. Without confusion, the
operator norm will also be denoted by $\|\cdot\|$.

Let $T \in \mathcal{C}_H^+$ be arbitrary, but fixed. Such an operator has a spectral representation of
the form

$$T = \sum_{j=1}^{\infty} \lambda_j P_j, \tag{A.1}$$

where $\lambda_1 > \lambda_2 > \cdots \downarrow 0$ are the distinct eigenvalues in decreasing order and $P_1, P_2, \ldots$ are
the projections onto the corresponding finite-dimensional eigenspaces.

The operator $T$ will be perturbed with a compact Hermitian operator $\mathcal{P} \in \mathcal{C}_H$. For
$r > 0$, we will write

$$O(\|\mathcal{P}\|^r) \tag{A.2}$$

to indicate any quantity (operator, vector, number) whose norm or absolute value is of
the indicated order as $\|\mathcal{P}\| \to 0$.

The perturbed operator $\tilde{T} = T + \mathcal{P}$ is no longer strictly positive, but still $\tilde{T} \in \mathcal{C}_H$. This
operator has spectral representation

$$\tilde{T} = \sum_{j=1}^{\infty} \tilde{\lambda}_j \tilde{P}_j, \tag{A.3}$$

where $\tilde{\lambda}_1, \tilde{\lambda}_2, \ldots$ are distinct nonzero eigenvalues such that $|\tilde{\lambda}_1| \ge |\tilde{\lambda}_2| \ge \cdots \downarrow 0$, and
$\tilde{P}_1, \tilde{P}_2, \ldots$ are the projections onto the corresponding finite-dimensional eigenspaces.
Furthermore, let $\varphi : D \to \mathbb{C}$ be analytic on the open domain $D \subset \mathbb{C}$, where

$$D \supset [-\varepsilon, \|T\| + \varepsilon] \quad \text{for some } \varepsilon > 0. \tag{A.4}$$



Theorem A.1. We have

$$\varphi(\tilde{T}) = \varphi(T) + \varphi'_T \mathcal{P} + O(\|\mathcal{P}\|^2), \tag{A.5}$$

where $\varphi'_T : \mathcal{C}_H \to \mathcal{L}_H$ is bounded and given by

$$\varphi'_T \mathcal{P} = \sum_{j \ge 1} \varphi'(\lambda_j) P_j \mathcal{P} P_j + \sum\sum_{j \ne k} \frac{\varphi(\lambda_k) - \varphi(\lambda_j)}{\lambda_k - \lambda_j} P_j \mathcal{P} P_k. \tag{A.6}$$

Remark A.1. The double sum in (A.6) is a correction term that is needed because the
increment $\mathcal{P} \in \mathcal{C}_H$ is arbitrary and therefore does not, in general, commute with $T$. This
generality is needed for statistical application, as in Theorem 3.1; see also Remark 3.1.
If $T$ and $\mathcal{P}$ do commute, however, then the double sum would disappear and we would
obtain the much simpler expression

$$\varphi'_T \mathcal{P} = \sum_{j \ge 1} \varphi'(\lambda_j) P_j \mathcal{P} = (\varphi'(T))\mathcal{P}. \tag{A.7}$$

In other words, in this case, the Frechet derivative $\varphi'_T$ equals the operator $\varphi'(T)$, obtained
by applying the usual functional calculus with the derivative $\varphi'$ of $\varphi$; see also Dunford
and Schwartz (), Theorem VII.6.10 for commuting operators.

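Theorem A.2. If the range of $P_1$ is one-dimensional so that $P_1 = p_1 \otimes p_1$ for some
unit vector $p_1 \in \mathbb{H}$, then there exists a unit vector $\tilde{p}_1 \in \mathbb{H}$ such that $\tilde{P}_1 = \tilde{p}_1 \otimes \tilde{p}_1$ for $\|\mathcal{P}\|$
sufficiently small. We have, moreover, that

$$\tilde{p}_1 = p_1 + A\mathcal{P}p_1 + O(\|\mathcal{P}\|^2), \tag{A.8}$$

where $A : \mathbb{H} \to \mathbb{H}$ is the bounded operator

$$A = \sum_{k \ge 2} \frac{1}{\lambda_1 - \lambda_k} P_k. \tag{A.9}$$

Theorem A.3. If the range of $P_1$ is one-dimensional and hence $P_1 = p_1 \otimes p_1$ for some
unit vector $p_1 \in \mathbb{H}$, then we have

$$\tilde{\lambda}_1 = \lambda_1 + \langle \mathcal{P}p_1, p_1\rangle + O(\|\mathcal{P}\|^2). \tag{A.10}$$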
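The first-order expansions of the largest eigenvalue and its eigenvector can be checked numerically for matrices. The sketch below assumes a symmetric $6 \times 6$ matrix with prescribed simple eigenvalues in place of $T$ and a normalized symmetric perturbation; the names and numerical values are illustrative only. It measures how far the exact perturbed eigenpair is from the linear approximations, using the reduced resolvent $\sum_{k \ge 2} (\lambda_1 - \lambda_k)^{-1} P_k$ for the eigenvector.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
lam = np.array([3.0, 2.0, 1.5, 1.0, 0.5, 0.2])        # distinct, decreasing eigenvalues
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))      # orthonormal eigenvectors
T = (Q * lam) @ Q.T
p1 = Q[:, 0]                                          # unit eigenvector for lam[0]

# Reduced resolvent: sum over k >= 2 of P_k / (lambda_1 - lambda_k).
A = sum(np.outer(Q[:, k], Q[:, k]) / (lam[0] - lam[k]) for k in range(1, d))

B = rng.standard_normal((d, d))
P = (B + B.T) / 2
P /= np.linalg.norm(P, 2)                             # symmetric perturbation, unit norm

def expansion_errors(eps):
    """Errors of the first-order expansions for the perturbation eps * P."""
    w, V = np.linalg.eigh(T + eps * P)
    lam1_tilde, p1_tilde = w[-1], V[:, -1]
    if p1_tilde @ p1 < 0:                             # eigenvectors are defined up to sign
        p1_tilde = -p1_tilde
    err_value = abs(lam1_tilde - (lam[0] + eps * (p1 @ P @ p1)))
    err_vector = np.linalg.norm(p1_tilde - (p1 + eps * (A @ P @ p1)))
    return err_value, err_vector

err_small = expansion_errors(5e-3)
err_large = expansion_errors(1e-2)
```

Halving the size of the perturbation reduces both errors by roughly a factor of four, in line with remainders of order $\|\mathcal{P}\|^2$.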